Local Feature Matching with SIFT


Understand and implement local feature matching with a (simplified) SIFT algorithm

1. Main idea

  1. Local Feature Matching:
    1. Goal: find correspondences across images. A correspondence can be between matching points, patches, or edges. Instance-level matching: multiple views of the same physical scene.
    2. Applications: (1) Image alignment (2) 3D reconstruction
    3. Characteristics of a good feature:
      1. Repeatability: the same feature can be found in several images despite (1) geometric transformations (viewing angle) and (2) photometric transformations (lighting)
      2. Saliency: each feature is distinctive
      3. Locality: a feature occupies a relatively small area, robust to clutter and occlusion.
      4. Compactness and Efficiency: there are far fewer features than image pixels.
    4. Three major steps:
      1. Detect interest points
        1. Goal: find points that are repeatable (detectable in both images) and distinctive.
        2. Characteristics of good corner locations:
          1. Invariant: the corner response should not change under (1) geometric transformations (viewing angle) and (2) photometric transformations (lighting).
          2. Covariant: the detected locations should move together with the transformation, so the same physical points are found in both images.
        3. Harris corner detection
          1. Core idea: shifting a window in any direction should give a large change in intensity. Corners usually have this property, which is why the method is also called corner detection.
          2. Math: calculate E(u,v) - the “error” of moving in direction (u,v).
            1. Each pixel's error is the squared intensity difference: $[I(x+u, y+v) - I(x, y)]^2$.
            2. The total is the sum of per-pixel errors weighted by the window function: $E(u,v) = \sum_{x,y} w(x,y)\,[I(x+u, y+v) - I(x, y)]^2$.
            3. Time complexity of the naive computation: $O(\text{window\_width}^2 \times \text{shift\_range}^2 \times \text{image\_width}^2)$. Very slow.
            4. Reduce complexity of correlation error:
              • Use a Taylor series expansion to approximate $I(x + \Delta u)$ as $I(x) + \nabla I(x) \cdot \Delta u$. This gives $E(u,v) = \sum_{x,y} w(x,y)\,[\nabla I(x) \cdot \Delta u]^2$.
              • It can be further rewritten as $\Delta u^T A \Delta u$, where $A$ is the auto-correlation matrix, a second moment matrix built from image derivatives: $A = M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$. Thus the error is simplified to $\Delta u^T M \Delta u$.
              • Cornerness function: the two eigenvalues of $M$ decide how large the corner response is. If both eigenvalues are large, the Harris value is large, meaning strong cornerness. The formula is $\text{har} = \det(M) - \alpha\,[\text{trace}(M)]^2 = g(I_x^2)\,g(I_y^2) - [g(I_x I_y)]^2 - \alpha\,[g(I_x^2) + g(I_y^2)]^2$, where $g$ is Gaussian smoothing, which further enhances the difference from adjacent areas (a NumPy sketch appears after this list).
            5. What is a second moment matrix? It summarizes the distribution of image gradients inside the window; its eigenvalues measure the intensity variation along the two principal directions.
            6. Why bother with the complicated Harris value instead of measuring derivatives along the x and y directions separately? Because Harris measures the intensity change in ALL directions; otherwise a straight diagonal line such as $x = y$ would satisfy the criteria even though it is not a corner.
          3. Window function: gives more weight to the center of the area; usually a Gaussian.
          4. Image changes that may/may not change Harris corner locations:
            1. Affine intensity change ($I \to aI + b$): partially invariant (invariant to the additive shift $b$, but partially affected by the multiplicative factor $a$, since some responses get bumped above or below the threshold)
            2. Image translation: covariant (derivatives and the window function are shift-invariant, so corners are detected at the corresponding shifted locations)
            3. Scale: not covariant (when the image is zoomed, each point in a corner looks less like a corner and more like an edge)
            4. Image rotation: covariant (the eigenvalues of $M$ are unchanged by rotation, so the same corners are found at rotated locations)
        4. Alternative corner detection algorithms
          1. Maximally Stable Extremal Regions (MSER): based on the watershed segmentation algorithm.
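
To make the Harris recipe concrete, below is a minimal NumPy/SciPy sketch of the cornerness map $\text{har} = \det(M) - \alpha\,[\text{trace}(M)]^2$ with a Gaussian window function. The values sigma = 1.0, alpha = 0.04, and the threshold/NMS parameters are illustrative defaults, not prescribed by these notes.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def harris_response(image, sigma=1.0, alpha=0.04):
    """Cornerness map har = det(M) - alpha * trace(M)^2, where M is the
    Gaussian-weighted second moment matrix of the image derivatives."""
    Iy, Ix = np.gradient(image.astype(float))
    # Entries of M, smoothed with the Gaussian window function g
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    return Sxx * Syy - Sxy**2 - alpha * (Sxx + Syy)**2

def detect_corners(image, rel_thresh=0.01, nms_size=5):
    """Keep local maxima of the response above a relative threshold."""
    R = harris_response(image)
    peaks = (R == maximum_filter(R, size=nms_size)) & (R > rel_thresh * R.max())
    return np.argwhere(peaks)  # (row, col) interest point locations
```
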
      2. Create local feature descriptors (two different types)
        1. Goal: extract a vector feature descriptor around each interest point.
        2. SIFT vector descriptor
        3. Steps:
          1. Take a 16x16 window of pixels around each interest point; each pixel contributes one gradient direction (and magnitude). Divide the window into a 4x4 grid of cells, each 4x4 pixels.
          2. For each cell, count the gradients into a histogram over 8 orientation directions.
          3. Reduce illumination effects:
            1. Normalize the 128-dim vector to unit length.
            2. Clamp all entries greater than 0.2 to 0.2.
            3. Renormalize to unit length.
        4. The final feature is a 4x4x8 = 128-dimensional vector (see the sketch after this subsection).
        5. Alternative feature descriptors
          1. Self-similarity descriptors
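
Below is a minimal sketch of the descriptor steps above, assuming a 16x16 window, a 4x4 grid of cells, and 8 orientation bins. Unlike full SIFT, it skips Gaussian weighting, scale/orientation normalization, and interpolation between bins.

```python
import numpy as np

def sift_descriptor(image, row, col):
    """Simplified SIFT descriptor at (row, col): 16x16 window, 4x4 grid of
    4x4-pixel cells, 8-bin orientation histogram per cell -> 128 dims.
    Assumes (row, col) is at least 8 pixels from the image border."""
    patch = image[row - 8:row + 8, col - 8:col + 8].astype(float)
    gy, gx = np.gradient(patch)              # per-pixel gradient
    mag = np.hypot(gx, gy)                   # gradient magnitude
    ang = np.arctan2(gy, gx)                 # orientation in [-pi, pi]
    bins = (((ang + np.pi) / (2 * np.pi)) * 8).astype(int) % 8
    hist = np.zeros((4, 4, 8))
    for i in range(16):
        for j in range(16):
            # Each pixel votes in its cell's histogram, weighted by magnitude
            hist[i // 4, j // 4, bins[i, j]] += mag[i, j]
    desc = hist.ravel()
    # Reduce illumination effects: normalize -> clamp at 0.2 -> renormalize
    desc /= np.linalg.norm(desc) + 1e-12
    desc = np.minimum(desc, 0.2)
    desc /= np.linalg.norm(desc) + 1e-12
    return desc  # 4 * 4 * 8 = 128-dimensional vector
```
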
      3. Match feature vectors
        1. Goal: compare vector feature descriptors.
        2. Use the sum of squared differences between two 128-dim feature vectors as the distance.
        3. This first yields a set of possible correspondence candidates for each point.
        4. An absolute distance threshold does not work well; using the ratio of the distance to the nearest neighbor over the distance to the second-nearest neighbor is better (the ratio test; see the sketch below).
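
A minimal sketch of nearest-neighbor matching with the ratio test; ratio = 0.8 is the commonly cited value from Lowe's paper, used here as an assumption. Because the distances below are squared, the test compares against the ratio squared.

```python
import numpy as np

def match_features(descs1, descs2, ratio=0.8):
    """Match each descriptor in descs1 (N1, 128) to its nearest neighbor in
    descs2 (N2, 128), keeping matches that pass the ratio test."""
    matches = []
    for i, d in enumerate(descs1):
        dists = np.sum((descs2 - d) ** 2, axis=1)  # SSD to every candidate
        nn1, nn2 = np.argsort(dists)[:2]           # two closest neighbors
        # Accept only if the best match is clearly better than the runner-up
        if dists[nn1] < (ratio ** 2) * dists[nn2]:
            matches.append((i, nn1))
    return matches
```
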

2. PyTorch

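For completeness, here is one way the Harris response from above might look in PyTorch, using conv2d for the derivatives and the window. The Sobel kernels, the 5x5 box window (standing in for the Gaussian), and alpha = 0.04 are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def harris_response_torch(image, alpha=0.04):
    """Harris cornerness map for a (H, W) float tensor, via convolutions."""
    img = image[None, None]  # -> (1, 1, H, W) as conv2d expects
    sobel = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    Ix = F.conv2d(img, sobel[None, None], padding=1)
    Iy = F.conv2d(img, sobel.t()[None, None], padding=1)
    # 5x5 box filter as the window function (a Gaussian is the usual choice)
    w = torch.ones(1, 1, 5, 5) / 25.0
    Sxx = F.conv2d(Ix * Ix, w, padding=2)
    Syy = F.conv2d(Iy * Iy, w, padding=2)
    Sxy = F.conv2d(Ix * Iy, w, padding=2)
    return (Sxx * Syy - Sxy**2 - alpha * (Sxx + Syy)**2)[0, 0]
```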